METHOD FOR TUNING VOICE PLAYBACK RATIO
TO OPTIMIZE CALL QUALITY 



Field of the Invention 
The present invention relates generally to the field of communication
systems, and more particularly, to a method for tuning voice playback ratio
to optimize call quality.

Background of the Invention 
In a communications system, jitter is a term used to describe variation
in interpacket arrival times. A jitter buffer is a digital storage device used to
compensate for a difference in the rate of flow of information or the time of
occurrence of events when transmitting information from one device to
another. The jitter buffer approximates a first-in-first-out (FIFO) queue with a
variable input rate and a constant output rate. In a typical communication
system, a jitter buffer operates as follows. When a first packet
arrives at the receiver's side, the packet is placed in the jitter buffer. The
receiver then starts a timer. For voice, the timer value is typically a fixed
number on the order of 100 ms to 200 ms. The timer value is called the
length of the jitter buffer. When the timer expires, the receiver reads the
packet from the buffer and uses it. The receiver then sets a recurring timer.
The interval of the recurring timer matches the nominal duration of each voice
packet. As the following packets arrive, the receiver places them in the jitter
buffer. Each time the recurring timer expires, the receiver reads the next
packet from the buffer. If a packet has not arrived by the time the receiver
attempts to read it from the buffer, the packet is counted as lost.
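
The fixed-length jitter buffer behavior described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name, its parameters and the sample arrival times are hypothetical, the first packet is assumed to arrive, and a packet arriving exactly at its read time is treated as on time.

```python
def count_lost_packets(arrival_ms, buffer_ms, packet_ms):
    """Count packets that miss their read time in a fixed-length jitter buffer.

    arrival_ms[i] is the arrival time of packet i, or None if the network
    dropped it. The first packet starts the timer, so packet i is read at
    arrival_ms[0] + buffer_ms + i * packet_ms.
    """
    first_arrival = arrival_ms[0]
    lost = 0
    for i, arrival in enumerate(arrival_ms):
        read_time = first_arrival + buffer_ms + i * packet_ms
        if arrival is None or arrival > read_time:
            lost += 1  # not in the buffer when the receiver tries to read it
    return lost


# Hypothetical arrival times (ms) for six 20 ms packets.
arrivals = [0, 25, 40, 190, 85, None]
print(count_lost_packets(arrivals, buffer_ms=100, packet_ms=20))  # 2
print(count_lost_packets(arrivals, buffer_ms=200, packet_ms=20))  # 1
```

With the longer buffer the late fourth packet is still played, at the cost of 100 ms of additional delay on every packet.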

Internet Protocol (IP) networks are designed to carry primarily
non-real-time data. As such, voice data may experience significant delay,
jitter and loss when crossing IP networks. Most current technologies use
dynamic jitter buffer algorithms to compensate for the difference in the rate
of network flow regardless of the current network conditions. United States
Patent No. 5,790,538 ('538 patent) issued to Gary Sugar on August 4, 1998,
describes a method of tuning jitter buffer size. However, the method of the
'538 patent does not account for cognitive effects such as listener perception
and the effects of loss on the coder-decoder (CODEC).

Thus, there is a need for an apparatus and method for adjusting the
jitter buffer size according to network conditions that addresses the
drawbacks of the prior art.

Brief Description of the Drawings 
FIG. 1 is a block diagram of the preferred embodiment of the
apparatus of the present invention.

FIG. 2 is a graph of the conversion between R-factor and Mean
Opinion Score (MOS) that can be used to estimate impairment factors for
CODECs not included in ETR-250.

FIG. 3 is a graph of impairment factor $I_{dd}$ for various values of
end-to-end (E-E) delay.

FIG. 4 is a graph of impairment factor $Ie_{loss}$ for various values of
average $jb_{loss}$.

FIG. 5 is a graph of playback time for various values of jitter buffer
length.

FIG. 6 is a graph of jitter buffer overflow for various values of pbt and
average delay.

FIG. 7 is a graph of impairment factor $I_{pbr}$ for various values of pbr.

FIG. 8 is a table of R-factors for various combinations of $jb_0$ and pbr.

FIG. 9 is a flow chart of the preferred embodiment of the method of the
present invention.

FIG. 10 is a flow chart of the preferred embodiment of step 904 in the
flow chart of FIG. 9.

FIG. 11 is a flow chart of the preferred embodiment of step 1008 in the
flow chart of FIG. 10.



Detailed Description of the Drawings

The present invention provides an apparatus and method for tuning
voice playback ratio to optimize call quality in a packet voice communications
system, while taking into account network conditions. In particular, the
invention optimizes jitter buffer length for call quality. Between bursts of
speech, the invention controls jitter buffer length by varying the initial jitter
buffer length ($jb_0$) and the playback ratio (pbr). During bursts of speech,
$jb_0$ is measured and the pbr is varied. The pbr is the ratio of the resampling
rate to the original sampling rate. The invention is useful in networks having
moderate to high jitter. Such networks typically have high packet loss ratios
(fraction of packets lost from a stream by a network due to errors or
congestion) and high end-to-end delays (amount of time between a speaker
producing a sound and a listener hearing the sound). In the preferred
embodiment, the invention causes speech that is stored in the buffer to be
played back slower than normal. This allows the system to start with a short
jitter buffer and grow the jitter buffer as needed to improve voice quality. A
shorter initial jitter buffer reduces end-to-end delay.
Referring to FIG. 1, the preferred embodiment of the apparatus 100 of
the present invention is shown. In the present invention, the voice decoder
104 controls the rate at which bits (voice data) are removed from the jitter
buffer 102. This allows the jitter buffer 102 to vary dynamically between
bursts of speech (InterBOS) and during bursts of speech (IntraBOS). The
voice decoder 104 is coupled to a voice resampler 106. The voice resampler
106 controls the number of Pulse Code Modulation (PCM) bits per second
coming out of the voice decoder 104, and consequently, the rate at which the
voice decoder 104 removes bits from the jitter buffer 102. The voice
resampler 106 accomplishes this by resampling the bit stream from the voice
decoder 104 to higher or lower bit rates. This has the effect of speeding up or
slowing down the speech that the listener eventually hears. The jitter buffer
102, voice decoder 104 and voice resampler 106 are implemented in
software and are commonly known in the art.
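
As a rough illustration of how a resampler can slow down or speed up decoded speech, the following Python sketch stretches a block of PCM samples by linear interpolation. It is a hypothetical stand-in, not the voice resampler 106 itself: the function name is invented here, and plain interpolation shifts the pitch of the speech, which a practical resampler may need to handle differently.

```python
def resample_linear(samples, pbr):
    """Stretch or compress PCM samples by linear interpolation.

    A pbr below 1 produces more output samples than input samples, so the
    speech plays slower at a fixed output rate; a pbr above 1 produces fewer.
    """
    if pbr <= 0:
        raise ValueError("pbr must be positive")
    if len(samples) < 2:
        return list(samples)
    out_len = max(1, round(len(samples) / pbr))
    out = []
    for n in range(out_len):
        pos = n * pbr                        # position in the input block
        i = min(int(pos), len(samples) - 2)  # left neighbour, kept in range
        frac = min(pos - i, 1.0)             # weight of the right neighbour
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
    return out


block = list(range(8))                    # eight hypothetical PCM samples
print(len(resample_linear(block, 0.8)))   # 10 samples: slower playback
print(len(resample_linear(block, 2.0)))   # 4 samples: faster playback
```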

The preferred embodiment of the present invention utilizes a new
element, a playback optimizer 108, which is coupled to the jitter buffer 102
and voice resampler 106. IntraBOS, the playback optimizer 108 gathers
statistics on the status of the communication link (e.g., transmission delay,
packet loss, jitter buffer effects, etc.), estimates the resulting call quality and
updates the voice resampler 106 to move the call quality closer to optimum.
InterBOS, the playback optimizer 108 resets the length of the jitter buffer 102
and the initial playback ratio of the voice resampler 106. The playback
optimizer 108 selects the new values based on simulations of the previous
BOS with alternative values of $jb_0$ and pbr. The playback optimizer 108 is
implemented in software on any computer or processor commonly known in
the art.

In order to take listener perception into account, the invention uses
Section 9.2 of Transmission and Multiplexing (TM); Speech Communication
Quality From Mouth to Ear for 3.1 kHz Handset Telephony Across Networks
(ETR-250), Sophia Antipolis, Valbonne, France, 1996. ETR-250 describes a
method of mapping network characteristics to customer satisfaction ratings
called the "e-model." The ETR-250 e-model is used in the method of the
present invention to estimate customer satisfaction with the quality of a voice
call in real time. The e-model seeks to convert each impairment in a
telephone call into a score on a psychological scale. The effects on the
psychological scale are additive. Units on the psychological scale are called
Impairment Factors (IFs) and an overall score on the scale is an R-factor
(R). The apparatus and method of the present invention develop a revised
form of the e-model equation:
$$R = R_0 - Ie_c - Ie_{loss} - Ie_{pbr} - Ie_{DD} \tag{1}$$

(Equation (1) includes only those quantities that are pertinent to the present
invention.) $R_0$ represents in principle the basic signal-to-noise ratio (SNR) of
the voice transmission at the 0 dBr point nearest side. $Ie_c$ represents the
impairment due to encoding with a specific CODEC. ETR-250 provides a
table with values for various CODECs. One may also use the Mean Opinion
Score (MOS) conversion in the graph of FIG. 2 to estimate IFs for CODECs
not included in ETR-250. As known in the art, the MOS is an estimation of
customer satisfaction on a scale of 1 (worst) to 5 (best). $Ie_{DD}$ is the
impairment due to a high absolute end-to-end delay (delay on the link plus
any delay due to jitter). The present invention introduces new elements $Ie_{loss}$
and $Ie_{pbr}$ into the ETR-250 e-model equation. $Ie_{loss}$ describes the behavior of
a specific CODEC under conditions of frame loss. The present invention
works best with CODECs that have a high tolerance for frame loss. However,
the invention also works with loss-sensitive CODECs. $Ie_{pbr}$ is the impairment
due to variations in speech reproduction rate. The apparatus and method of
the present invention have the ability to play back speech at a slower than
normal rate.
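
Because the impairments are additive, equation (1) reduces to a single subtraction chain. The Python sketch below shows this; the function and parameter names are hypothetical, and the sample values are the ones used in the first worked iteration later in this description.

```python
def r_factor(r0, ie_c, ie_loss, ie_pbr, ie_dd):
    """Revised e-model of equation (1): subtract each impairment from R_0."""
    return r0 - ie_c - ie_loss - ie_pbr - ie_dd


# Ideal R_0, PCM decoder, and the impairments of the first worked iteration.
print(round(r_factor(r0=100, ie_c=1, ie_loss=32.5, ie_pbr=0, ie_dd=3.72), 2))
```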

In order to improve call quality by adjusting the jitter buffer size
according to network conditions, the present invention is concerned with
three network elements that affect packet voice networks: delay, jitter and
loss. The graph of FIG. 3 shows the relationship between end-to-end delay
and IF. FIG. 3 can be obtained from Figure 52 (Impairment Factor $I_{DD}$ as a
function of the absolute one-way transmission time) of ETR-250 and formulas
9.1.34, 9.1.35 and 9.1.36, which are herein incorporated by reference. As
shown in FIG. 3, very small delays, those less than 150 ms, have no
measurable effect on the listener's perception of call quality. As delay
increases, the effect becomes steadily more noticeable. Once delays
become large, small changes no longer have much effect. The preferred
embodiment of the apparatus and method of the present invention uses FIG.
3 to obtain the IF $I_{DD}$ for a given value of end-to-end delay.
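
In software, a graph such as FIG. 3 would typically be stored as a table of points and read by interpolation. The Python sketch below shows one way to do that. The curve of FIG. 3 is not reproduced here: the three points are assumptions built only from values stated in this description (no measurable effect at 150 ms, and the two delay impairments quoted in the worked example), and the helper name is hypothetical. The lookup clamps outside the listed range, which understates the impairment for longer delays.

```python
from bisect import bisect_right

# (end-to-end delay in ms, impairment) pairs; assumed stand-in for FIG. 3.
DELAY_POINTS = [(150.0, 0.0), (206.5, 3.72), (263.0, 10.5)]


def interpolate(points, x):
    """Piecewise-linear lookup in a sorted list of (x, y) points."""
    xs = [p[0] for p in points]
    if x <= xs[0]:
        return points[0][1]
    if x >= xs[-1]:
        return points[-1][1]
    k = bisect_right(xs, x)              # first point strictly to the right
    (x0, y0), (x1, y1) = points[k - 1], points[k]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


print(interpolate(DELAY_POINTS, 100.0))   # 0.0
print(interpolate(DELAY_POINTS, 206.5))   # 3.72
```

The same helper could serve for the other graph lookups if their curves were tabulated.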

The effects of the second network element, loss, are specific to the
particular CODEC used in the network. In the preferred embodiment of the
present invention, a PCM CODEC is used. In accordance with ETR-250, the
graph of FIG. 4 is an approximation of the effects of loss on IF $Ie_{loss}$ for a PCM
CODEC. The graph can be determined by running MOS experiments as
described in Section 2.5 (Opinion Tests) of the Handbook on
Telephonometry, ITU-T (CCITT), Geneva 1992, which is incorporated herein
by reference. The graph is also based on Perceptual Speech Quality
Measure (PSQM) scores, which are described in P.861, Objective Quality
Measurement of Telephone-band (300-3400 Hz) Speech Codecs (02/98),
which is incorporated herein by reference. As shown in FIG. 4, a PCM
CODEC degrades fairly linearly until loss reaches around 40%.

The third network element, jitter, describes the variations in intervals
between packets. A jitter buffer, such as jitter buffer 102 in FIG. 1, removes
jitter by converting it into either of the two previously described network
elements, delay or loss. Details of the conversion will now be discussed. A
jitter buffer converts jitter into delay by holding onto packets for a predictable
amount of time. The graph of FIG. 5 illustrates this concept. The graph
shows the amount of delay induced by jitter buffers of different lengths. For
illustrative purposes, the average transmission delay (amount of time for
transmission between a sender and a receiver) is 200 ms. In this case, the
jitter buffer adds 200 ms to each packet so that all packets experience the
same end-to-end delay. For example, 200 ms is added to packets in a jitter
buffer of length 200 ms to produce a playback time (pbt) of 400 ms; 200 ms is
added to packets in a jitter buffer of length 400 ms to produce a pbt of 600
ms; and so on.

When a packet arrives too late to play out of the jitter buffer, the jitter
buffer converts jitter into loss. The pattern of loss depends heavily upon the
pattern of the jitter. For ease of illustration and discussion, the graph of FIG.
6 assumes a normal distribution of jitter around the average delay. As will be
recognized by one of ordinary skill in the art, many tools can be used to make
a record of the actual jitter distributions on the network. The graph of FIG. 6
illustrates a network with 1σ of jitter at 200 ms. For various pbts, the graph
plots jitter buffer overflow versus average delay. Jitter buffers of different
lengths effectively integrate the normal distribution from negative infinity to a
particular time past the average delay. The delay due to the jitter buffer is
combined in the graph with all other delays to yield a playback time.
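
Under the normal-distribution assumption stated above, the fraction of packets that arrive after the playback time is the upper tail of the distribution beyond the pbt. The Python sketch below computes that tail with the standard library. The function name and the `sigma_ms` parameter (standard deviation of the packet delay) are introduced here for illustration, and the result is not claimed to match the values read from FIG. 6.

```python
import math


def jitter_buffer_overflow(pbt_ms, avg_delay_ms, sigma_ms):
    """Fraction of packets arriving later than pbt_ms, for normally
    distributed delay with mean avg_delay_ms and std deviation sigma_ms."""
    z = (pbt_ms - avg_delay_ms) / sigma_ms
    # Upper tail of the standard normal: 1 - Phi(z) = erfc(z / sqrt(2)) / 2.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# A playback time one standard deviation past the average delay.
print(round(jitter_buffer_overflow(400.0, 200.0, 200.0), 4))  # 0.1587
```

A buffer with no headroom (pbt equal to the average delay) overflows half the time in this model, and each additional standard deviation of headroom cuts the overflow sharply.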

During a burst of speech, the length of the jitter buffer cannot be
modified directly. Such a modification could cause a discontinuity in the output
speech in the form of a pause or missing speech, for example. Instead,
phase-continuous changes are made to the jitter buffer. In accordance with
the preferred embodiment of the present invention, these phase-continuous
changes are accomplished by adjusting the pbr. A pbr of 0.8 means that 0.8
seconds of encoded speech plays out of the jitter buffer as 1 second of output
speech. A pbr of 1 is the most accurate reproduction of the original signal.
Empirical analysis has shown that if the pbr is less than 1.0, the jitter buffer
grows throughout the burst of speech. If the pbr is greater than 1.0, the jitter
buffer shrinks throughout the burst of speech until it reaches a length of 0 ms.
The pbr is itself an impairment. FIG. 7 estimates the IF $Ie_{pbr}$ due to pbr.
The graph of FIG. 7 can be determined by running MOS experiments as
described in Section 2.5 (Opinion Tests) of the Handbook on
Telephonometry, ITU-T (CCITT), Geneva 1992.
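
The growth and shrinkage of the jitter buffer during a burst of speech can be sketched as follows. The sketch assumes the buffer length changes linearly with elapsed time at a rate of (1 - pbr), which is consistent with the gain formula used in the worked example of this description; the function name is hypothetical.

```python
def buffer_length_ms(jb0_ms, pbr, elapsed_ms):
    """Estimated jitter buffer length after elapsed_ms of a burst of speech.

    Each millisecond of output consumes pbr milliseconds of stored speech,
    so the buffer changes by (1 - pbr) per millisecond and never drops
    below 0 ms.
    """
    return max(0.0, jb0_ms + (1.0 - pbr) * elapsed_ms)


print(round(buffer_length_ms(113.0, 0.957, 2000.0), 1))  # 199.0: grows by 86 ms
print(buffer_length_ms(56.5, 1.0, 2000.0))               # 56.5: unchanged
print(buffer_length_ms(100.0, 1.2, 2000.0))              # 0.0: fully drained
```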

Given a set of network conditions (delay, jitter and loss), the preferred
embodiment of the apparatus and method of the present invention undergoes
an iterative process to determine the optimum values for the control variables
$jb_0$ and pbr that will yield the best R-factor. The table of FIG. 8 includes the
optimum values for $jb_0$ and pbr (values that yield the highest R-factor) and a
few points surrounding the optimum values for measured network conditions:
delay = 150 ms; jitter = 100 ms; and loss = .04. To illustrate the principles of
the invention, two iterations of the process using values of $jb_0$ and pbr in the
table of FIG. 8 will be described with reference to the flow charts of FIGs.
9-11.

Referring to FIG. 9, the preferred embodiment of the method of the
present invention first measures current network conditions (step 902). In
the current example, the measured network conditions are: delay = 150 ms,
jitter = 100 ms and loss = 4%. The example also assumes a 2000 ms BOS.
Given the values of delay, jitter and loss determined in step 902, the method
determines values of $jb_0$ and pbr that yield the highest R (as defined in
equation (1) previously herein) (step 904). In the preferred embodiment of
the present invention, R is determined in accordance with the flowchart of
FIG. 10. At step 1002, the method begins with an initial value for $jb_0$
and pbr. For the first iteration in the current example, the initial $jb_0$ is 56.5 ms
and the initial pbr is 1. (These values, as shown in the table of FIG. 8, are
not necessarily the first values chosen by the method, but rather are used for
illustrative purposes only.) At step 1004, the method determines $R_0$. For
simplicity of explanation, the current example assumes an ideal system where
$R_0$ is 100. Section 9.1.3.2 of ETR-250, which is herein incorporated by
reference, provides an explanation of how to calculate $R_0$ for a less than ideal
system. At step 1006, the method determines $Ie_c$. In the preferred
embodiment of the apparatus of the present invention, the voice decoder 104
is a PCM decoder. The impairment factor $Ie_c$ for a PCM decoder is 1. At
step 1008, the method determines the impairment factor $Ie_{loss}$. In the
preferred embodiment, $Ie_{loss}$ is determined according to the flowchart of FIG.
11.

Referring to FIG. 11, the first step in determining $Ie_{loss}$ is determining an
initial pbt (pbt at the beginning of a BOS) (step 1102) according to the equation:

$$\text{initial pbt} = jb_0 + \text{delay} \tag{2}$$

In the current example, the initial pbt is equal to 56.5 + 150 = 206.5 ms. For
an initial pbt of 206.5 ms and a delay of 150 ms, the method determines the
initial jitter buffer overflow (step 1104), preferably using the graph of FIG. 6.
As shown in the graph, the initial jitter buffer overflow is 0.21. At step 1106,
the method uses the initial jitter buffer overflow to determine an initial jitter
buffer loss ($jb_{loss}$) according to the equation:

$$\text{initial } jb_{loss} = 1 - [(1 - \text{loss}) \times (1 - \text{initial jitter buffer overflow})] \tag{3}$$
In the current example, the initial $jb_{loss}$ is 1 - [(1 - .04) x (1 - .21)], which equals
0.24. Next, at step 1108, the method calculates the gain in the jitter buffer
length during a BOS according to the equation:

$$\text{gain in jitter buffer length} = (1 - pbr) \times BOS \tag{4}$$

In the current example, the gain is (1 - 1) x 2000, which is 0. (This should be
the case for a pbr of 1 since 1 second of encoded speech plays out of the
jitter buffer as 1 second of output speech.) At step 1110, the method
determines the final pbt according to the equation:

$$\text{final pbt} = jb_0 + \text{delay} + \text{gain in jitter buffer length} \tag{5}$$
In the current example, the final pbt is equal to 56.5 + 150 + 0 = 206.5 ms.
For a final pbt of 206.5 ms and a delay of 150 ms, the method determines the
final jitter buffer overflow (step 1112), preferably using the graph of FIG. 6.
As shown in the graph, the final jitter buffer overflow is the same as the initial
jitter buffer overflow, which is 0.21. At step 1114, the method calculates the
final $jb_{loss}$ according to the equation:

$$\text{final } jb_{loss} = 1 - [(1 - \text{loss}) \times (1 - \text{final jitter buffer overflow})] \tag{6}$$

In the current example, the final $jb_{loss}$ is 1 - [(1 - .04) x (1 - .21)], which equals 0.24.
At step 1116, the method calculates the average $jb_{loss}$ according to the
equation:

$$\text{average } jb_{loss} = (\text{initial } jb_{loss} + \text{final } jb_{loss})/2 \tag{7}$$

In the current example, the average $jb_{loss}$ is (.24 + .24)/2, which is .24. Using
this value of average $jb_{loss}$, the method determines impairment factor
$Ie_{loss}$ (step 1118), preferably using the graph of FIG. 4. As shown in the
graph, for an average $jb_{loss}$ of .24, $Ie_{loss}$ is 32.5.
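
The sequence of steps 1102 through 1116 can be collected into one routine, sketched below in Python. The function names are hypothetical, and `overflow_at` is an assumed callable standing in for the reading of FIG. 6; here it simply returns the 0.21 quoted for the first iteration.

```python
def jb_loss(network_loss, overflow):
    """Combined loss: a packet survives only if neither the network nor
    the jitter buffer drops it (equations (3) and (6))."""
    return 1.0 - (1.0 - network_loss) * (1.0 - overflow)


def average_jb_loss(jb0_ms, delay_ms, pbr, bos_ms, network_loss, overflow_at):
    """Average jitter buffer loss over one burst of speech."""
    initial_pbt = jb0_ms + delay_ms                                   # eq. (2)
    initial = jb_loss(network_loss, overflow_at(initial_pbt, delay_ms))
    gain = (1.0 - pbr) * bos_ms                                       # eq. (4)
    final_pbt = jb0_ms + delay_ms + gain                              # eq. (5)
    final = jb_loss(network_loss, overflow_at(final_pbt, delay_ms))
    return (initial + final) / 2.0                                    # eq. (7)


# First iteration: the graph reading is 0.21 at both ends of the burst.
result = average_jb_loss(56.5, 150.0, 1.0, 2000.0, 0.04,
                         overflow_at=lambda pbt_ms, delay_ms: 0.21)
print(round(result, 2))  # 0.24
```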

Referring back to FIG. 10, after determining $Ie_{loss}$ at step 1008, the
method determines impairment factor $I_{pbr}$ (step 1010). Preferably, $I_{pbr}$ is
determined from the graph of FIG. 7. As shown, for a pbr of 1, $I_{pbr}$ is 0. At
step 1012, the method determines impairment factor $I_{dd}$. First, the method
determines the end-to-end delay according to the equation:

$$\text{E-E delay} = jb_0 + \text{delay} \tag{8}$$

In the current example, the end-to-end delay is 56.5 + 150 = 206.5 ms. Using
this value of end-to-end delay, the method determines impairment factor $I_{dd}$,
preferably using the graph of FIG. 3. As shown, for an end-to-end delay of
206.5 ms, $I_{dd}$ is 3.72. At step 1014, the method calculates that for $jb_0$ = 56.5
and pbr = 1, $R = R_0 - Ie_c - Ie_{loss} - Ie_{pbr} - Ie_{DD}$ = 100 - 1 - 32.5 - 0 - 3.72 = 62.8.
This result is shown in the table of FIG. 8 (62.76). At step 1016, the method
determines whether the optimum value of R has been achieved. If the
answer is yes, the method ends (step 1020) and the values of $jb_0$ and pbr
that yield the highest R have been found. If the answer is no, the method
changes the values of $jb_0$ and/or pbr (step 1018) and repeats steps 1004
through 1014 to calculate a new value of R.
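
The description does not specify how step 1016 recognizes the optimum or how step 1018 chooses the next candidate. One simple possibility, shown here purely as an assumption, is to evaluate R for every candidate pair in a fixed set and keep the best. The function name is hypothetical, and the two R values are the ones this description quotes from the table of FIG. 8.

```python
def best_settings(candidates, r_of):
    """Return (jb0, pbr, R) for the candidate pair with the highest R.

    candidates is an iterable of (jb0, pbr) pairs; r_of(jb0, pbr) returns
    the R-factor for that pair (steps 1004 through 1014).
    """
    best = None
    for jb0, pbr in candidates:
        r = r_of(jb0, pbr)
        if best is None or r > best[2]:
            best = (jb0, pbr, r)
    return best


# R-factors quoted in this description for the two illustrative iterations.
table = {(56.5, 1.0): 62.76, (113.0, 0.957): 78.04}
print(best_settings(table.keys(), lambda jb0, pbr: table[(jb0, pbr)]))
```

An exhaustive search is easy to reason about but grows with the number of candidates; a local search around the previous optimum would be another option.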

Turning now to the second illustrative iteration of the method, at step
1018 the method sets $jb_0$ to 113 and pbr to .957. (These values, as shown in
the table of FIG. 8, are not necessarily the second values chosen by the
method, but rather are used for illustrative purposes only.) At step 1004, the
method determines that $R_0$ is 100. At step 1006, the method again
determines that impairment factor $Ie_c$ is 1 for a PCM decoder. At step 1008,
the method determines the impairment factor $Ie_{loss}$, preferably according to
the flowchart of FIG. 11.

Referring to FIG. 11, at step 1102 the method determines an initial pbt of 263 ms
(initial pbt = $jb_0$ + delay = 113 + 150). For an initial pbt of 263 ms and a delay
of 150 ms, the method determines the initial jitter buffer overflow (step 1104),
preferably using the graph of FIG. 6. As shown in the graph, the initial jitter
buffer overflow is 0.055. At step 1106, the method uses the initial jitter buffer
overflow to determine an initial $jb_{loss}$ of 0.0928
(initial $jb_{loss}$ = 1 - [(1 - loss) x (1 - initial jitter buffer overflow)] =
1 - [(1 - .04) x (1 - .055)]). Next, at step 1108, the method calculates a gain in
the jitter buffer length of 86 ms (gain in jitter buffer length = (1 - pbr) x BOS =
(1 - .957) x 2000). At step 1110, the method determines a final pbt of 349 ms
(final pbt = $jb_0$ + delay + gain in jitter buffer length = 113 + 150 + 86). For a
final pbt of 349 ms and a delay of 150 ms, the method determines the final
jitter buffer overflow (step 1112), preferably using the graph of FIG. 6. As
shown in the graph, the final jitter buffer overflow is 0.002. At step 1114, the
method calculates a final $jb_{loss}$ of .042
(final $jb_{loss}$ = 1 - [(1 - loss) x (1 - final jitter buffer overflow)] =
1 - [(1 - .04) x (1 - 0.002)]). At step 1116, the method calculates an average
$jb_{loss}$ of .068 (average $jb_{loss}$ = (initial $jb_{loss}$ + final $jb_{loss}$)/2 =
(.0928 + .042)/2). Using this value of average $jb_{loss}$, the method determines
impairment factor $Ie_{loss}$ (step 1118), preferably using the graph of FIG. 4. As
shown in the graph, for an average $jb_{loss}$ of .068, $Ie_{loss}$ is 10.3.

Referring back to FIG. 10, after determining $Ie_{loss}$ at step 1008, the
method determines impairment factor $I_{pbr}$ (step 1010). Preferably, $I_{pbr}$ is
determined from the graph of FIG. 7. As shown, for a pbr of .957, $I_{pbr}$ is .14.
At step 1012, the method determines impairment factor $I_{dd}$. First, the
method determines an end-to-end delay of 263 ms (E-E delay = $jb_0$ + delay =
113 + 150). Using this value of end-to-end delay, the method determines
impairment factor $I_{dd}$, preferably using the graph of FIG. 3. As shown, for an
end-to-end delay of 263 ms, $I_{dd}$ is 10.5. At step 1014, the method calculates
that for $jb_0$ = 113 and pbr = .957,
$R = R_0 - Ie_c - Ie_{loss} - Ie_{pbr} - Ie_{DD}$ = 100 - 1 - 10.3 - .14 - 10.5 = 78.06.
This result is shown in the table of FIG. 8 with slight
variation due to rounding errors (78.04).

Between bursts of speech, the preferred embodiment of the invention
optimizes call quality by varying the initial $jb_0$ and the pbr to achieve the
best R. During bursts of speech, the value of $jb_0$ is measured at the time of
recalculating R and the pbr is varied to achieve the best R. The method can
be performed during bursts of speech and between bursts of speech whenever
the network conditions change.

While the invention may be susceptible to various modifications and
alternative forms, a specific embodiment has been shown by way of example
in the drawings and has been described in detail herein. However, it should
be understood that the invention is not intended to be limited to the particular
forms disclosed. Rather, the invention is to cover all modifications, equivalents
and alternatives falling within the spirit and scope of the invention as defined
by the following appended claims.